Learning to Prompt for Vision-Language Models

Abstract

Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from traditional representation learning, which is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to a downstream task via prompting, i.e., classification weights are synthesized from natural language describing the classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming: one needs to spend a significant amount of time on word tuning, since a slight change in wording could have a huge impact on performance. Inspired by recent advances in prompt learning research in natural language processing (NLP), we propose Context Optimization (CoOp), a simple approach specifically for adapting CLIP-like vision-language models for downstream image recognition. Concretely, CoOp models a prompt's context words with learnable vectors while the entire pre-trained parameters are kept fixed. To handle different image recognition tasks, we provide two implementations of CoOp: unified context and class-specific context. Through extensive experiments on 11 datasets, we demonstrate that CoOp requires as few as one or two shots to beat hand-crafted prompts by a decent margin and is able to gain significant improvements over prompt engineering with more shots, e.g., with 16 shots the average gain is around 15% (with the highest reaching 45%). Despite being a learning-based approach, CoOp achieves superb domain generalization performance compared with the zero-shot model using hand-crafted prompts.
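The prompt construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the averaging "text encoder", the random class-name embeddings, the toy sizes, and every function name below are assumptions made purely for illustration, standing in for CLIP's frozen pre-trained components. The sketch only shows how a prompt is assembled from learnable context vectors plus a class-name token, how classification weights are derived from it, and how unified and class-specific context differ.

```python
import math
import random

random.seed(0)
DIM, N_CTX = 8, 4                 # toy sizes, chosen only for illustration
CLASSES = ["cat", "dog", "car"]   # hypothetical class names


def rand_vec(dim):
    """Random vector used as a placeholder embedding."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


def normalise(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Frozen pieces: stand-ins for pre-trained components, never updated.
CLASS_TOKEN = {name: rand_vec(DIM) for name in CLASSES}


def frozen_text_encoder(token_vectors):
    """Toy encoder (assumption): average the token vectors, then normalise."""
    mean = [sum(col) / len(token_vectors) for col in zip(*token_vectors)]
    return normalise(mean)


def init_context(class_specific):
    """Create the learnable context: one shared set, or one set per class."""
    if class_specific:
        return {name: [rand_vec(DIM) for _ in range(N_CTX)] for name in CLASSES}
    shared = [rand_vec(DIM) for _ in range(N_CTX)]
    return {name: shared for name in CLASSES}   # every class uses the same vectors


def class_weights(context):
    """Prompt for a class = [context vectors..., class-name token]."""
    return {
        name: frozen_text_encoder(context[name] + [CLASS_TOKEN[name]])
        for name in CLASSES
    }


def predict(image_feature, context):
    """Score each class by cosine similarity with the image feature."""
    feat = normalise(image_feature)
    weights = class_weights(context)
    scores = {
        name: sum(a * b for a, b in zip(feat, w)) for name, w in weights.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    unified = init_context(class_specific=False)
    label, scores = predict(rand_vec(DIM), unified)
    print(label, scores)
```

In a training loop, only the vectors returned by `init_context` would receive gradient updates, while the encoder and class-name embeddings stay fixed, mirroring the abstract's statement that the pre-trained parameters are kept frozen.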

Journal

Journal title: International Journal of Computer Vision

Year: 2022

ISSN: 0920-5691, 1573-1405

DOI: https://doi.org/10.1007/s11263-022-01653-1